
Linear Discriminant Analysis

Data preprocessing

# Import the dataset
dataset = read.csv('Wine.csv')
head(dataset, 10)
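For context: this appears to be the Wine dataset commonly used in ML course material, with 13 numeric chemical measurements per wine plus a Customer_Segment label taking the values 1, 2 and 3. Assuming that column layout, a quick structural check before splitting:

# Inspect the structure: 13 numeric predictors plus the class label
str(dataset)
# Class balance across the three customer segments
table(dataset$Customer_Segment)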
# Splitting the dataset into the Training set and Test set
library(caTools)
set.seed(42)
split = sample.split(dataset$Customer_Segment, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
head(training_set, 10)
head(test_set, 10)
# Feature Scaling (column 14 is the class label, so it is excluded)
training_set[-14] = scale(training_set[-14])
test_set[-14] = scale(test_set[-14])
head(training_set, 10)
head(test_set, 10)
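As a quick sanity check (a minimal sketch, not part of the original notebook), the scaled predictor columns should now have mean approximately 0 and standard deviation 1:

# Verify scaling: means ~ 0 and standard deviations ~ 1 for all predictors
round(colMeans(training_set[-14]), 3)
round(apply(training_set[-14], 2, sd), 3)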

Applying Linear Discriminant Analysis

library(MASS)
lda = lda(formula = Customer_Segment ~ ., data = training_set)
training_set = as.data.frame(predict(lda, training_set))
head(training_set, 10)
training_set = training_set[c(5, 6, 1)] # Reorder to x.LD1, x.LD2, class
head(training_set, 10)
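The reordering above relies on the structure of predict()'s output for an lda fit: a list holding the predicted class, a posterior-probability matrix (one column per class), and the discriminant scores x. Flattened with as.data.frame, that gives six columns (class, posterior.1, posterior.2, posterior.3, x.LD1, x.LD2), so indices 5, 6 and 1 pick out the two discriminants and the class. A quick way to confirm this:

# The flattened prediction frame has these columns:
# "class" "posterior.1" "posterior.2" "posterior.3" "x.LD1" "x.LD2"
colnames(as.data.frame(predict(lda, test_set)))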
test_set = as.data.frame(predict(lda, test_set))
test_set = test_set[c(5, 6, 1)] # Reorder to x.LD1, x.LD2, class
head(test_set, 10)

Fitting the classifier to the Training set

library(e1071)
classifier = svm(formula = class ~ .,
                 data = training_set,
                 type = 'C-classification',
                 kernel = 'radial')
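The radial kernel's gamma and the cost parameter are left at their defaults here. As an optional extension (not in the original notebook, and with illustrative parameter grids), e1071's tune() can grid-search them with cross-validation:

# Optional: grid-search gamma and cost via cross-validation
# (hypothetical grids chosen for illustration)
tuned = tune(svm, class ~ ., data = training_set,
             type = 'C-classification', kernel = 'radial',
             ranges = list(gamma = 10^(-2:1), cost = 10^(0:2)))
summary(tuned)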

Predicting the Test set results

y_pred = predict(classifier, newdata = test_set[-3])
head(y_pred, 10)
head(test_set[3], 10)

Making the Confusion Matrix

cm = table(test_set[, 3], y_pred)
cm
   y_pred
     1  2  3
  1 12  0  0
  2  0 14  0
  3  0  1  9

The classifier made 12 + 14 + 9 = 35 correct predictions and 1 incorrect prediction (one class-3 observation was predicted as class 2).
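Equivalently, accuracy can be read straight off the confusion matrix as the sum of the diagonal over the total (a small addition, not in the original notebook):

# Accuracy = correct predictions / total predictions = 35 / 36 ~ 0.972
sum(diag(cm)) / sum(cm)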


Visualizing the Training set results

# install.packages('ElemStatLearn')
library(ElemStatLearn)
set = training_set
X1 = seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = 0.01)
X2 = seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = 0.01)
grid_set = expand.grid(X1, X2)
colnames(grid_set) = c('x.LD1', 'x.LD2')
y_grid = predict(classifier, newdata = grid_set)
plot(set[, -3],
     main = 'Kernel SVM (Training set)',
     xlab = '1st Linear Discriminant Component',
     ylab = '2nd Linear Discriminant Component',
     xlim = range(X1), ylim = range(X2))
contour(X1, X2, matrix(as.numeric(y_grid), length(X1), length(X2)), add = TRUE)
points(grid_set, pch = '.', col = ifelse(y_grid == 2, 'lightblue', ifelse(y_grid == 1, 'springgreen3', 'tomato')))
points(set, pch = 21, bg = ifelse(set[, 3] == 2, 'blue3', ifelse(set[, 3] == 1, 'green4', 'red3')), col = 'white')
# Legend labels fixed to match the actual classes (1, 2, 3) and their point colors
legend("topright", legend = c("1", "2", "3"), pch = 16, col = c('green4', 'blue3', 'red3'))
[Plot: Kernel SVM decision regions on the LDA-projected training set]

Visualizing the Test set results

set = test_set
X1 = seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = 0.01)
X2 = seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = 0.01)
grid_set = expand.grid(X1, X2)
colnames(grid_set) = c('x.LD1', 'x.LD2')
y_grid = predict(classifier, newdata = grid_set)
plot(set[, -3],
     main = 'Kernel SVM (Test set)',
     xlab = '1st Linear Discriminant Component',
     ylab = '2nd Linear Discriminant Component',
     xlim = range(X1), ylim = range(X2))
contour(X1, X2, matrix(as.numeric(y_grid), length(X1), length(X2)), add = TRUE)
points(grid_set, pch = '.', col = ifelse(y_grid == 2, 'lightblue', ifelse(y_grid == 1, 'springgreen3', 'tomato')))
points(set, pch = 21, bg = ifelse(set[, 3] == 2, 'blue3', ifelse(set[, 3] == 1, 'green4', 'red3')), col = 'white')
# Legend labels fixed to match the actual classes (1, 2, 3) and their point colors
legend("topright", legend = c("1", "2", "3"), pch = 16, col = c('green4', 'blue3', 'red3'))
[Plot: Kernel SVM decision regions on the LDA-projected test set]